@davidsbatista davidsbatista commented Jan 5, 2026

Related Issues

Proposed Changes:

  • count_documents_by_filter() - count documents matching filter criteria
  • count_distinct_values_by_filter() - get distinct value counts for metadata fields with optional filtering
  • get_fields_info() - retrieve field type information from index mapping
  • get_field_min_max() - get min/max values for numeric metadata fields
  • get_field_unique_values() - get unique values for a field with pagination and content-based filtering
  • query_sql() - execute SQL queries against OpenSearch with support for multiple response formats (JSON, CSV, JDBC, RAW)
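For readers unfamiliar with the OpenSearch query DSL, here is a minimal sketch of request bodies that operations like these could map to. The helper names are hypothetical and this is not the code from this PR; it only assumes the standard `_count` API and the `min`, `max`, and `cardinality` aggregations, and that the field name is passed exactly as it is indexed.

```python
from typing import Any, Dict, Optional


def build_count_body(query: Dict[str, Any]) -> Dict[str, Any]:
    """Body for the `_count` API: only a query is sent, no hits come back."""
    return {"query": query}


def build_min_max_body(field: str) -> Dict[str, Any]:
    """Search body that asks only for min/max aggregations on `field`."""
    return {
        "size": 0,  # skip the hits, only the aggregations are needed
        "aggs": {
            "min_value": {"min": {"field": field}},
            "max_value": {"max": {"field": field}},
        },
    }


def build_distinct_count_body(
    field: str, query: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Search body that counts distinct values of `field`, optionally filtered."""
    body: Dict[str, Any] = {
        "size": 0,
        "aggs": {"distinct": {"cardinality": {"field": field}}},
    }
    if query is not None:
        body["query"] = query
    return body
```

Note that the `cardinality` aggregation is approximate for high-cardinality fields, so an exact distinct count may need a different approach.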

How did you test it?

  • added integration tests covering the new methods, in both sync and async versions

Notes for the reviewer

  • added httpx>=0.28.1 dependency
  • the query_sql() method performs a raw HTTP request (based on httpx) if the specified response format is not JSON
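To illustrate what such a raw request might look like, here is a rough sketch that only builds the request pieces for the OpenSearch SQL plugin endpoint (`/_plugins/_sql` with a `format` query parameter). `build_sql_request` is a hypothetical helper, not the method added in this PR, and no network call is made.

```python
from typing import Any, Dict

SQL_ENDPOINT = "/_plugins/_sql"
SUPPORTED_FORMATS = {"json", "csv", "jdbc", "raw"}


def build_sql_request(
    base_url: str, query: str, response_format: str = "json"
) -> Dict[str, Any]:
    """Return the URL, query params, and JSON payload for a SQL plugin call."""
    fmt = response_format.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported response format: {response_format!r}")
    return {
        "url": base_url.rstrip("/") + SQL_ENDPOINT,
        "params": {"format": fmt},  # selects how the plugin serializes results
        "json": {"query": query},   # the SQL statement goes in the request body
    }
```

Assuming httpx is installed, the returned dict could be sent with `httpx.post(**build_sql_request(...))`, reading `response.text` for CSV or raw output instead of parsing JSON.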

Checklist

@github-actions github-actions bot added integration:opensearch type:documentation Improvements or additions to documentation labels Jan 5, 2026
@davidsbatista davidsbatista changed the title Feat/add count filtering to open search document store feat: adding count with filtering operations to open search document store Jan 5, 2026
@davidsbatista davidsbatista changed the title feat: adding count with filtering operations to open search document store feat: adding count with filtering operations to OpenSearchDocumentStore Jan 5, 2026
@davidsbatista davidsbatista marked this pull request as ready for review January 6, 2026 11:16
@davidsbatista davidsbatista requested a review from a team as a code owner January 6, 2026 11:16
@davidsbatista davidsbatista requested review from sjrl and removed request for a team January 6, 2026 11:16
@sjrl sjrl requested a review from tstadel January 7, 2026 08:36
sjrl commented Jan 7, 2026

Hey @tstadel, I'd also appreciate your review on this, since we want to make sure it will work in the platform as well.

davidsbatista and others added 17 commits January 13, 2026 14:27
…res/opensearch/document_store.py

Co-authored-by: Sebastian Husch Lee <[email protected]>
…res/opensearch/document_store.py

Co-authored-by: Sebastian Husch Lee <[email protected]>
…res/opensearch/document_store.py

Co-authored-by: Sebastian Husch Lee <[email protected]>
…res/opensearch/document_store.py

Co-authored-by: Sebastian Husch Lee <[email protected]>
…res/opensearch/document_store.py

Co-authored-by: Sebastian Husch Lee <[email protected]>
…res/opensearch/document_store.py

Co-authored-by: Sebastian Husch Lee <[email protected]>
@davidsbatista davidsbatista requested a review from sjrl January 13, 2026 14:57
davidsbatista and others added 2 commits January 14, 2026 10:46
…res/opensearch/document_store.py

Co-authored-by: Sebastian Husch Lee <[email protected]>
…res/opensearch/document_store.py

Co-authored-by: Sebastian Husch Lee <[email protected]>
Comment on lines +1292 to +1299
This method would return:
{
'content': {'type': 'text'},
'category': {'type': 'keyword'},
'status': {'type': 'keyword'},
'priority': {'type': 'long'},
}
Contributor

It seems the return type of get_metadata_fields_info is specific to OpenSearch, whereas the other functions you've added, which we plan to add across all Document Stores, return a format we defined ourselves.

Do you think it's possible or reasonable for us to try and standardize something like this?

Contributor Author

We can add a transformation inside each DocumentStore's get_metadata_fields_info so that it returns this format. Once this PR is merged, I will update the issues to note this and to rename the functions and their signatures.
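As a rough idea of what such a transformation could look like for OpenSearch, the sketch below reduces a mapping response to the `{field: {"type": ...}}` shape shown in the docstring above. `normalize_fields_info` is a hypothetical name, and the input is assumed to follow the usual `{index: {"mappings": {"properties": {...}}}}` layout of a get-mapping response.

```python
from typing import Any, Dict


def normalize_fields_info(
    mapping_response: Dict[str, Any], index: str
) -> Dict[str, Dict[str, str]]:
    """Keep only the `type` of each top-level field in the index mapping."""
    properties = (
        mapping_response.get(index, {}).get("mappings", {}).get("properties", {})
    )
    # Object fields carry nested `properties` instead of a `type`; skip them here.
    return {
        name: {"type": spec["type"]}
        for name, spec in properties.items()
        if "type" in spec
    }
```

Other document stores would need their own mapping from native type names to whatever shared vocabulary is agreed on; this sketch passes the OpenSearch type names through unchanged.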

assert len(draft_docs) == 1
assert draft_docs[0].meta["category"] == "B"

def test_count_documents_by_filter(self, document_store: OpenSearchDocumentStore):
Contributor

Probably out of scope for this PR, but could we open an issue to move these standardized tests into haystack.testing.document_store as testing modules? I think this would help reduce the duplicate test code we have in all of our document store integrations.

WDYT?

Contributor Author

good catch yes, that's a good idea 👍🏽

Contributor Author

#2747

The only issue I see is how to parameterize the tests when they are tied to a specific DocStore. We can probably extend the DocStore Protocol with a few more operations, provided all of our document stores already support them.
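A minimal sketch of that idea, under the assumption that the shared test only needs a protocol with the new operation. `CountingStore`, `InMemoryStore`, and `check_count_by_filter` are hypothetical names, documents are plain dicts rather than Haystack `Document` objects, and the fake store only understands a single `==` condition.

```python
from typing import Any, Dict, List, Protocol, runtime_checkable


@runtime_checkable
class CountingStore(Protocol):
    """Hypothetical protocol extension with one of the new operations."""

    def count_documents_by_filter(self, filters: Dict[str, Any]) -> int: ...


class InMemoryStore:
    """Toy store used only to show that the shared check is store-agnostic."""

    def __init__(self, docs: List[Dict[str, Any]]) -> None:
        self._docs = docs

    def count_documents_by_filter(self, filters: Dict[str, Any]) -> int:
        if filters["operator"] != "==":
            raise NotImplementedError("this sketch only supports '=='")
        field = filters["field"]
        if field.startswith("meta."):
            field = field[len("meta."):]
        return sum(1 for d in self._docs if d["meta"].get(field) == filters["value"])


def check_count_by_filter(store: CountingStore) -> None:
    """Shared check: expects a store holding exactly one category 'B' document."""
    filters = {"field": "meta.category", "operator": "==", "value": "B"}
    assert store.count_documents_by_filter(filters) == 1
```

Each integration would then only supply a fixture that builds its own store with the agreed-upon documents and pass it to the shared check.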

Development

Successfully merging this pull request may close these issues.

add the following operations to OpenSearchDocumentStore

3 participants